11 research outputs found

    Convolutional Neural Fabrics

    Despite the success of CNNs, selecting the optimal architecture for a given task remains an open problem. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyper-parameters of a fabric are the number of channels and layers. While individual architectures can be recovered as paths, the fabric can in addition ensemble all embedded architectures together, sharing their weights where their paths overlap. Parameters can be learned using standard methods based on back-propagation, at a cost that scales linearly in the fabric size. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset. Comment: Corrected typos (in proceedings of NIPS16).
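
    As a rough illustration of how a trellis can embed exponentially many architectures while its own size grows only linearly, the Python sketch below counts layer-to-layer paths by dynamic programming. It is a simplification, not the authors' implementation: it keeps only the layer and scale axes, it assumes each node feeds the same, the next-finer, and the next-coarser scale in the following layer, and `count_paths` is a hypothetical helper name.

```python
def count_paths(num_layers, num_scales):
    """Count paths from scale 0 of the first layer to any scale of the last.

    Assumption (not taken from the abstract): a node at (layer, scale) connects
    to scales scale-1, scale, scale+1 of the next layer, when they exist.
    """
    counts = [0] * num_scales
    counts[0] = 1  # every path starts at scale 0 of the first layer
    for _ in range(num_layers - 1):
        nxt = [0] * num_scales
        for s, c in enumerate(counts):
            for t in (s - 1, s, s + 1):
                if 0 <= t < num_scales:
                    nxt[t] += c
        counts = nxt
    return sum(counts)


# The work done here is proportional to num_layers * num_scales (the trellis
# size), while the number of embedded paths grows much faster with depth.
for layers in (1, 2, 3, 4, 8, 16):
    print(layers, count_paths(layers, num_scales=3))
```

    Under these assumptions, each path corresponds to one chain-structured architecture, and paths that pass through the same node share that node's weights.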

    Coordinated Local Metric Learning

    Mahalanobis metric learning amounts to learning a linear data projection, after which the L2 metric is used to compute distances. To allow more flexible metrics, not restricted to linear projections, local metric learning techniques have been developed. Most of these methods partition the data space using clustering, and a separate metric is learned for each cluster. With local metrics, however, it is not clear how to measure distances between data points assigned to different clusters. In this paper we propose to embed the local metrics in a global low-dimensional representation, in which the L2 metric can be used. With each cluster we associate a linear mapping that projects the data to the global representation. This global representation directly allows computing distances between points regardless of which local cluster they belong to. Moreover, it also enables data visualization in a single view, and the use of efficient L2-based retrieval methods. Experiments on the Labeled Faces in the Wild dataset show that our approach improves over previous global and local metric learning approaches.
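
    The opening claim, that a Mahalanobis metric is the L2 metric applied after a linear projection, can be checked numerically. The sketch below is only an illustration: the projection matrix `L` and the points `x` and `y` are made-up values, and the helper names are hypothetical. It compares the quadratic form with M = LᵀL against the plain L2 distance between the projected points.

```python
import math


def matvec(L, x):
    """Multiply matrix L (list of rows) by vector x."""
    return [sum(l_ij * x_j for l_ij, x_j in zip(row, x)) for row in L]


def l2(a, b):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))


def mahalanobis(x, y, L):
    """Distance sqrt((x - y)^T M (x - y)) with M = L^T L, via the quadratic form."""
    d = [xi - yi for xi, yi in zip(x, y)]
    n = len(d)
    M = [[sum(L[k][i] * L[k][j] for k in range(len(L))) for j in range(n)]
         for i in range(n)]
    return math.sqrt(sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n)))


# Hypothetical 2x3 projection (3-D input, 2-D output) and two example points.
L = [[2.0, 0.0, 1.0],
     [0.0, 1.0, -1.0]]
x = [1.0, 2.0, 3.0]
y = [0.0, 1.0, 1.0]

d_projected = l2(matvec(L, x), matvec(L, y))  # L2 distance after projecting
d_mahalanobis = mahalanobis(x, y, L)          # quadratic form with M = L^T L
print(d_projected, d_mahalanobis)
```

    The two values agree up to rounding, which is why L2-based retrieval methods can be applied once the data has been projected.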

    SPDF: Sparse Pre-training and Dense Fine-tuning for Large Language Models

    The pre-training and fine-tuning paradigm has contributed to a number of breakthroughs in Natural Language Processing (NLP). Instead of directly training on a downstream task, language models are first pre-trained on large datasets with cross-domain knowledge (e.g., Pile, MassiveText) and then fine-tuned on task-specific data (e.g., natural language generation, text summarization). Scaling the model and dataset size has helped improve the performance of LLMs, but unfortunately, this has also led to highly prohibitive computational costs. Pre-training LLMs often requires orders of magnitude more FLOPs than fine-tuning, and the model capacity often remains the same between the two phases. To achieve training efficiency with respect to training FLOPs, we propose to decouple the model capacity between the two phases and introduce Sparse Pre-training and Dense Fine-tuning (SPDF). In this work, we show the benefits of using unstructured weight sparsity to train only a subset of weights during pre-training (Sparse Pre-training) and then recover the representational capacity by allowing the zeroed weights to learn (Dense Fine-tuning). We demonstrate that we can induce up to 75% sparsity into a 1.3B parameter GPT-3 XL model, resulting in a 2.5x reduction in pre-training FLOPs, without a significant loss in accuracy on the downstream tasks relative to the dense baseline. By rigorously evaluating multiple downstream tasks, we also establish a relationship between sparsity, task complexity and dataset size. Our work presents a promising direction to train large GPT models at a fraction of the training FLOPs using weight sparsity, while retaining the benefits of pre-trained textual representations for downstream tasks. Comment: Accepted to Uncertainty in Artificial Intelligence (UAI) 2023 Conference; 13 pages, 4 figures (Main Paper) + 5 pages (Supplementary Material).
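
    The two phases can be sketched on a toy weight vector. This is a minimal illustration under stated assumptions, not the paper's training code: the mask is drawn at random, the gradient is a constant stand-in for a real loss gradient, and `make_mask` and `sgd_step` are hypothetical helper names.

```python
import random


def make_mask(n, sparsity, rng):
    """Return a 0/1 mask with round(n * sparsity) zeros at random positions."""
    n_zero = round(n * sparsity)
    mask = [0] * n_zero + [1] * (n - n_zero)
    rng.shuffle(mask)
    return mask


def sgd_step(weights, grads, lr, mask=None):
    """One SGD step; when a mask is given, masked-out weights are held at zero."""
    if mask is None:
        return [w - lr * g for w, g in zip(weights, grads)]
    return [(w - lr * g) * m for w, g, m in zip(weights, grads, mask)]


rng = random.Random(0)
n = 16
mask = make_mask(n, 0.75, rng)                        # 75% of weights masked out
weights = [rng.uniform(-1.0, 1.0) * m for m in mask]  # sparse initialisation
grads = [0.5] * n                                     # toy gradient (assumption)

# Sparse pre-training: only the unmasked subset of weights is updated.
pretrained = sgd_step(weights, grads, lr=0.1, mask=mask)

# Dense fine-tuning: the mask is dropped, so the zeroed weights start to learn.
finetuned = sgd_step(pretrained, grads, lr=0.1)

print(sum(mask), "of", n, "weights trained during pre-training")
```

    In this sketch the masked weights stay at zero throughout the sparse phase and move away from zero only once the mask is removed, which mirrors the decoupling of capacity between the two phases described above.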

    A Prospective Multicenter Study Evaluating Learning Curves and Competence in Endoscopic Ultrasound and Endoscopic Retrograde Cholangiopancreatography Among Advanced Endoscopy Trainees: The Rapid Assessment of Trainee Endoscopy Skills (RATES) Study

    Background and aims: Based on the Next Accreditation System, trainee assessment should occur on a continuous basis with individualized feedback. We aimed to validate endoscopic ultrasound (EUS) and endoscopic retrograde cholangiopancreatography (ERCP) learning curves among advanced endoscopy trainees (AETs) using a large national sample of training programs and to develop a centralized database that allows assessment of performance in relation to peers. Methods: ASGE-recognized training programs were invited to participate, and AETs were graded on ERCP and EUS exams using a validated competency assessment tool that assesses technical and cognitive competence in a continuous fashion. Grading for each skill was done using a 4-point scoring system, and a comprehensive data collection and reporting system was built to create learning curves using cumulative sum analysis. Individual results and benchmarking to peers were shared with AETs and trainers quarterly. Results: Of the 62 programs invited, 20 programs and 22 AETs participated in this study. At the end of training, the median number of EUS and ERCP exams performed per AET was 300 (range 155-650) and 350 (range 125-500), respectively. Overall, 3786 exams were graded (EUS: 1137; ERCP: biliary 2280, pancreatic 369). Learning curves for individual endpoints and for overall technical and cognitive aspects in EUS and ERCP demonstrated substantial variability and were successfully shared with all programs. The majority of trainees achieved overall technical (EUS: 82%; ERCP: 60%) and cognitive (EUS: 76%; ERCP: 100%) competence at the conclusion of training. Conclusions: These results demonstrate the feasibility of establishing a centralized database to report individualized learning curves and confirm the substantial variability in time to achieve competence among AETs in EUS and ERCP.
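
    For readers unfamiliar with cumulative sum (CUSUM) learning curves, the sketch below shows one common formulation, in which each success lowers the running score by s and each failure raises it by 1 - s. It is an assumption-laden illustration rather than the study's method: the abstract does not say how the 4-point scores were mapped to success or failure, and the acceptable and unacceptable failure rates `p0` and `p1` used here are made-up values.

```python
import math


def cusum_curve(outcomes, p0=0.1, p1=0.2):
    """Return the running CUSUM score for a sequence of outcomes.

    outcomes: iterable of booleans, True for a successful exam.
    p0, p1:   acceptable and unacceptable failure rates (hypothetical values).
    """
    P = math.log(p1 / p0)
    Q = math.log((1 - p0) / (1 - p1))
    s = Q / (P + Q)  # amount subtracted per success; 1 - s is added per failure
    curve, total = [], 0.0
    for success in outcomes:
        total += -s if success else (1 - s)
        curve.append(total)
    return curve


# Toy sequence: an early failure followed by a run of successes.
curve = cusum_curve([False, True, True, True, True])
print([round(v, 3) for v in curve])
```

    A curve that trends downward indicates performance better than the acceptable failure rate, while an upward trend indicates the opposite. Plotting one such curve per trainee and endpoint makes variability in time to competence visible.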

    Apprentissage de représentations pour la reconnaissance visuelle (Learning representations for visual recognition)

    In this dissertation, we propose methods and data-driven machine learning solutions which address and benefit from the recent overwhelming growth of digital media content. First, we consider the problem of improving the efficiency of image retrieval. We propose a coordinated local metric learning (CLML) approach which learns local Mahalanobis metrics and integrates them in a global representation where the l2 distance can be used. This allows for data visualization in a single view, and the use of efficient l2-based retrieval methods. Our approach can be interpreted as learning a linear projection on top of an explicit high-dimensional embedding of a kernel. This interpretation allows for the use of existing frameworks for Mahalanobis metric learning for learning local metrics in a coordinated manner. Our experiments show that CLML improves over previous global and local metric learning approaches for the task of face retrieval. Second, we present an approach to leverage the success of CNN models for visible spectrum face recognition to improve heterogeneous face recognition, e.g., recognition of near-infrared images from visible spectrum training images. We explore different metric learning strategies over features from the intermediate layers of the networks, to reduce the discrepancies between the different modalities. In our experiments we found that the depth of the optimal features for a given modality is positively correlated with the domain shift between the source domain (CNN training data) and the target domain. Experimental results show that we can use CNNs trained on visible spectrum images to obtain results that improve over the state of the art for heterogeneous face recognition with near-infrared images and sketches. Third, we present convolutional neural fabrics for exploring the discrete and exponentially large CNN architecture space in an efficient and systematic manner. Instead of aiming to select a single optimal architecture, we propose a "fabric" that embeds an exponentially large number of architectures. The fabric consists of a 3D trellis that connects response maps at different layers, scales, and channels with a sparse homogeneous local connectivity pattern. The only hyperparameters of the fabric (the number of channels and layers) are not critical for performance. The acyclic nature of the fabric allows us to use backpropagation for learning. Learning can thus efficiently configure the fabric to implement each one of exponentially many architectures and, more generally, ensembles of all of them. While scaling linearly in terms of computation and memory requirements, the fabric leverages exponentially many chain-structured architectures in parallel by massively sharing weights between them. We present benchmark results competitive with the state of the art for image classification on MNIST and CIFAR10, and for semantic segmentation on the Part Labels dataset.

    Utility of point-of-care ultrasound in differentiating causes of shock in resource-limited setup

    Background: Delivering an early diagnosis of shock in resource-limited settings is challenging, especially with limited availability of point-of-care laboratory and radiological diagnostic facilities. There is growing urgency to provide point-of-care diagnosis and treatment for time-sensitive conditions like shock. Aims: We evaluated the application of point-of-care ultrasound (Rapid Ultrasound for Shock and Hypotension [RUSH] protocol), considering the different disease cohorts and practice realities in our setup. Settings and Design: This was a single-center prospective diagnostic study of the diagnostic accuracy of point-of-care ultrasound (RUSH protocol). The study was approved by the ethics committee. Materials and Methods: The study was conducted at the emergency medicine department of a tertiary care government hospital in Central Gujarat from November 16 to October 17. All adult patients with clinical features of shock with systolic blood pressure 1 presenting to the emergency department were included as participants. The results of point-of-care ultrasound (RUSH protocol) were compared with the diagnosis given by consultants of the respective departments as per standard departmental practices. Statistical Analysis and Results: A total of 130 patients were enrolled in this study. The mean time taken for the point-of-care ultrasound examination (RUSH protocol) was 12 min (range 11–14 min). The kappa index was 0.860. The protocol correctly diagnosed 100% of obstructive shock, 96.3% of cardiogenic shock, 94.4% of hypovolemic shock, 80.9% of mixed-type shock, and 75% of distributive shock. Conclusion: This study highlights the role of point-of-care ultrasound (RUSH protocol) in the early diagnosis of shock etiology in the emergency medicine department. Diagnosis using point-of-care ultrasound (RUSH protocol) agreed significantly with the medical diagnosis, and the protocol differentiated the causes of shock with good accuracy, except for distributive shock.
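
    The kappa index reported above measures agreement beyond what chance alone would produce. The sketch below computes Cohen's kappa from an agreement table, assuming that is the statistic meant. The counts are invented for illustration and are not the study's data, and `cohens_kappa` is a hypothetical helper name.

```python
def cohens_kappa(table):
    """Cohen's kappa for a square agreement table.

    table[i][j] counts cases that rater A placed in category i and rater B
    placed in category j.
    """
    n = sum(sum(row) for row in table)
    k = len(table)
    observed = sum(table[i][i] for i in range(k)) / n  # observed agreement
    row_totals = [sum(row) for row in table]
    col_totals = [sum(table[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance from the marginal totals.
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / (n * n)
    return (observed - expected) / (1 - expected)


# Invented 2x2 example: rows are one rater's labels, columns the other's.
toy_table = [[20, 5],
             [10, 15]]
kappa = cohens_kappa(toy_table)
print(round(kappa, 3))
```

    Values near 1 indicate near-perfect agreement and values near 0 indicate agreement no better than chance.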